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Abstract: Network tomography has been regarded as one of the most promis- 
ing methodologies for performance evaluation and diagnosis of the massive and 
decentralized Internet. This paper proposes a new estimation approach for 
solving a class of inverse problems in network tomography, based on marginal 
distributions of a sequence of one-dimensional linear projections of the ob- 
served data. We give a general identifiability result for the proposed method 
and study the design issue of these one dimensional projections in terms of 
statistical efficiency. We show that for a simple Gaussian tomography model, 
there is an optimal set of one-dimensional projections such that the estimator 
obtained from these projections is asymptotically as efficient as the maximum 
likelihood estimator based on the joint distribution of the observed data. For 
practical applications, we carry out simulation studies of the proposed method 
for two instances of network tomography. The first is for traffic demand tomog- 
raphy using a Gaussian Origin-Destination traffic model with a power relation 
between its mean and variance, and the second is for network delay tomogra- 
phy where the link delays are to be estimated from the end-to-end path delays. 
We compare estimators obtained from our method and that obtained from us- 
ing the joint distribution and other lower dimensional projections, and show 
that in both cases, the proposed method yields satisfactory results. 



1. Introduction 

Network performance monitoring and diagnosis is a challenging problem due to the 
size and the decentralized nature of the Internet. For instance, when an end-to- 
end measurement indicates the performance degradation of an Internet path, the 
exact cause is hard to uncover because the path may traverse several autonomous 
systems (AS) that are often owned by different entities, e.g., a service provider, and 
the service providers generally do not share their internal performance. Even they 
do, there is no scalable way to correlate the link level measurements to end-to-end 
performance in a large network like the Internet. Similarly, the service providers 
may be interested in the end-to-end path characteristics that they can not observe 
directly. 

Network tomography is a technology for addressing these issues [l|,i,i,E| (see 
for an excellent review of this topic). The key advantage of network tomography 
is that it does not require the collaboration between the network elements and the 
end users. There are two main classes of problems being studied in the literature. 
The first estimates the link-level characteristics, such as packet loss or delay based 
on end-to-end measurements [Til l . The loss tomography problem 
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can be viewed as a special case of the delay tomography problem if loss is treated as 
a very large delay. Here we consider the case where packets are transmitted using the 
multicast protocol, that is, a packet is sent from a source to multiple destinations 
simultaneously. The second is traffic demand tomography where one predicts end- 
to-end path-level traffic intensity based on link- level traffic measurements [5j, iS, 2lU ■ 
For both network delay tomography and traffic demand tomography, the statistical 
models can be unified by 

(1) Y = ^X, 

where X = {Xi, . . . ,Xj)'^ is an /-dimensional vector of unobservable network dy- 
namic parameters with mutually independent components, and Y (Yi, . . . ,Yj)^ 
is a J-dimensional vector of measurements and A is a J x / routing matrix with el- 
ements or 1. The objective is to estimate the distribution of X given independent 
observations from the distribution of Y. 

As with other inverse problems, the main difficulty in network tomography lies in 
the fact that I > J. For the traffic demand tomography / can be as large as , and 
for the delay tomography model, / can be as large as 2J — 1. As a result, A is not 
invertible and the tomography problem is ill-posed. However, if components of X are 
assumed independent, it can be shown that the distribution functions of X under 
model ([1]) can be uniquely determined up to their means under mild conditions 
. This mean ambiguity can be further removed if the component distributions of 
X satisfy some additional constraints, for example, a relation between their means 
and higher order moments (such as the Poisson distribution), positive point mass 
at zero etc jsl, [lil, [3, |2l| . 

To estimate the distribution of X, likelihood based inference has been proposed 
[1, However, since the likelihood involves a high order convolution, such 

inference is computationally expensive except for specific distributions. This can be 
limiting in some circumstances. For example, in delay tomography where the link 
delays are to be estimated, the continuous distributions for both end-to-end and link 
delays are approximated by non-parametric discrete distributions using the same 
set of bins of equal widths. This is problematic for a heterogeneous network such as 
Internet where the link delay distributions can differ significantly. To overcome the 
computational difficulties in likelihood based inference, a characteristic-function 
based generalized moment estimator has been proposed for general distributions 
0, where the model parameters are estimated by minimizing a contrast function 
between the empirical characteristic function and the parametrized characteristic 
function of Y under the model. 

However, it has been realized that either the full likelihood or joint characteristic 
function based inference may still be expensive when the dimension of X is high. 
A solution to this is the pseudo-likelihood approach proposed in [l6l |. where the 
parameters are estimated by minimizing a pseudo-likelihood function that is con- 
structed by multiplying the marginal likelihoods of a sequence of subsets Y* of the 
high-dimensional observation Y. The idea is that these marginal likelihoods only 
focus on a small subset Y'* and thus are computationally much cheaper. Specifically, 
the authors considered constructing these subsets using all pairwise components of 
Y, i.e., Y'* = (^1)^2)1 1 < ji < j2 < J, and found that such a pseudo-likelihood 
based estimator is computationally efficient as compared to the estimator based on 
full likelihood meanwhile has little loss in statistical efficiency. The same idea has 
been used in the characteristic-function based estimator in [9| by considering char- 
acteristic functions of a sequence of low dimensional subsets Yg, and in the local 
likelihood estimator in [15] by considering Y^ of both one and two dimensions. 



Method of ID projections 



47 



In all the above approaches, each Y'' can be considered as a projection of a high 
dimensional measurement Y onto a low dimensional subspace, taken along the axis 
of components of Y. One can further generalize this and take projections in arbi- 
trary directions, for example, using BJY where Bg is matrix with a small number 
of columns. However, an optimal choice of these lower-dimensional projections so 
as to balance the computational complexity and statistical efficiency has not been 
studied previously. For statistical efficiency, one might want to use relatively high 
dimensional projections so that the information on multivariate dependency will 
not be lost. For computational efficiency, one might prefer to a small set of lower 
dimensional projections. 

This paper provides a partial solution to the design issue of these lower di- 
mensional projections. Here we consider the extreme case - one-dimensional (ID) 
linear projections of the observed data Y. That is to say, the statistical inference 
regarding the distribution of X is based on the marginal distributions of a series 
of ID-projections of Y, say /3jY, /3k G TZ'^ ioi- k — 1, . . . , K. The contributions of 
this paper are two-fold. First, we give a sufficient condition for the choice of these 
ID-projections so that the X distribution can be uniquely determined. Second, we 
study the design issue of such ID-projections in terms of statistical efficiency. For a 
simple Gaussian tomography model where component distributions of X are Gaus- 
sian, we show that there exists an optimal choice of ID-projections, selected by a 
correlation based rule, from which the obtained estimator is asymptotically as effi- 
cient as the maximum likelihood estimator (MLE) based on the joint distributions 
of Y. For more realistic tomography models, we carry out simulation studies of two 
instances: the first is for traffic demand tomography where the Origin-Destination 
traffic is also Gaussian but its mean and variance are related through a power equa- 
tion, and the second is for network delay tomography where the link delays are to 
be estimated using a continuous mixture distribution. For both cases, we show the 
proposed method based on ID-projections yields satisfactory results as compared 
to estimators using other choice of projections and the complete data. 

The remaining of the paper is organized as follows. In Section 2, the method of 
ID-projections is proposed, and the identifiability issue and parameter estimation 
are discussed. In Section 3, a simple Gaussian tomography model is analyzed, and 
the optimal design of ID-projections and its efficiency are studied. In Section 4, 
simulation studies of the ID-projection method are presented for traffic demand 
tomography and network delay tomography. We conclude in Section 5. 

The following conventions will be used throughout the paper. ID-projectio- ns 
represent one-dimensional projections with the form /3^Y. 2D-projections represent 
pairwise components of Y, e.g. (Yj, Yji). A lower case letter represents a vector and 
an upper case letter represents a matrix, with the exception of X and Y, which 
represent random vectors as in model ([T]). Mab or M{a,h) represents the (a,6)th 
element of a matrix M . Vi is the ith element of a vector v and Pk & Ti-'^ is a column 
vector and (3ki is its ith element. Af and represent the transpose of M and v 
respectively. M~-^ is the transpose of M~^, the inverse of M. 

2. Method of one-dimensional projections 

In this section, we formally describe the method of ID-projections for solving the 
inverse problem in ([1]) in the context of network tomography. We first present a 
necessary and sufficient condition for identifiability and then discuss the parameter 
estimation in this setting. 
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2. 1 . Identifiability 

One fundamental question of the ID-projection method is whether the distribu- 
tion of X can be uniquely determined by the marginal distributions of these ID- 
projections. This is the so-called identifiability issue. For simplicity we shall start 
with a simple matrix A derived from the two-leaf tree delay tomography model 
(Figure [T]), and then generalize it to an arbitrary matrix. For illustration purpose, 
we use a special set of ID-projections on the two-leaf tree to explain the main idea 
behind the identifiability. For the two- leaf tree in Figure [TJ let X = (Xi,X2,X3)-^ 
be the internal link delays from top to bottom and left to right, and Y — (Yi, ¥2)^ 
be the end-to-end delay from the root node to the two leaves from left to right. 
Since the end-to-end delay is the sum of internal link delays on the path, we can 
write Yi= Xi+ X2 and Y2 = Xi + X3. That is, A = [1, 1, 0; 1, 0, 1]. Following 9], 
we assume that the characteristic function of each component of X is analytiqj and 
we refer to this as the analytic condition in this paper. 

Lemma 2.1. For the two-leaf tree in Figure]^ assume that X has mutually in- 
dependent components and satisfies the analytic condition. Then the distribution 
of X is determined up to a shift in its mean by the marginal distributions of 
Yi,Y2,Yi + aY2 if a ^ and a 7^ —1, where the mean of "X. satisfies the constraint 
E[Xi] + E[X2] = E[Yi] and E[Xi] + EiX^] = E[Y2]. 

Proof. Let /3i = (1,0)-^, /32 — (0, 1)"^ and (3^ — (l,a)-^, then the three projections 
can be written as /Sj Y, k = 1,2,3. Let -jk = A'^Pk = ilki,lk2,Jk3V , then the 
characteristic function of f3^Y is equal to 

3 

i=l 

where ipi is the characteristic function of Xi. Suppose that there exists another 
set of characteristic functions ■01 which also satisfy the above three equations. Let 
uJi{t) = \og^pi{t) — log?/5i(i). Then we have for k = 1,2,3, 

3 

i=l 

Let din be the nth order derivative of uji{t) at t = 0. By evaluating the nth order 
derivatives at zero, we have for k = 1,2,3, 

3 

i=l 

For each n > 2, let M„ be the coefficient matrix of the above linear equations of 
{din : * = 1, 2, 3}. Since A = [1, 1, 0; 1, 0, 1], we have 

Mn = 



^ An analytic characteristic function corresponds to a distribution function which has moments 
rrij. of all orders k and lim sup^.^^ [|r?ij.|/fc!] is finite. 



1 1 
1 1 

(1 + a)" 1 a" 
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Fig 1. Two-leaf tree 



Fig 2. Four- leaf tree 



Fig 3. One router network 



The identifiability problem is equivalent to asking whether din = {i = 1,2,3), 
i.e. det(M„) ^ 0, for all n > 2. Note that det(M„) = (1 + a)" - a" - 1. Let 
fix) = (1 + x)" - x" - 1. Thus fix) = + a;)"-i - 

If n > 2 is even, f'{x) > and thus f{x) is monotone with a unique zero x = 0. 

If n > 2 is odd, f'{x) has a unique zero x = —1/2 and thus f{x) has two zeros 
X = and a; = —1. 

Under assumptions of a 7^ and a 7^ —1, det(Af„) — f{a) 7^ 0, and thus din — 
{i = 1,2,3) for all n > 2. Hence each uji{t) is a linear function. The conclusion 
follows readily. □ 

Lemma [2.11 states that if chosen properly, only three ID-projections are needed 
to determine the distribution of X (ignoring its mean) . The condition a 7^ merely 
says that the third projection Yi + aY2 does not coincide with either Yi or ¥2- 
Obviously, from the proof we can see that the condition a 7^ —1 is not needed in 
order to identify the even order cumulants of each Xi, i.e. the even order derivatives 
of logipiit) ai t — 0, but a 7^ —1 is required in order to identify the odd order 
cumulants. 

Remark. It is not hard to show that all even order cumulants can be determined 
by the marginal distributions of arbitrary three distinct^ projections ajYi + bjY2, 
j = 1, 2, 3. But some further constraints, i.e. aj +bj 7^ {j ~ 1, 2, 3) which is similar 
to the condition a 7^ —1 in the above lemma, are needed in order to determine the 
odd order cumulants except the first order. Lemma l2 . 1 1 provides a simple set of ID- 
projections which can be used to identify the X distributions. This can be easily 
extended to the case where A corresponds to a tree topology. For a general matrix 
A, the identifiability issue is addressed by the following theorem. 

Theorem 2.1. For a general tomography problem Y — Ali. introduced by {Ip, 
let /3jY, k = 1,...,K, be K ID-projections of \ onto the linear space of its 
components. Let M be a K x I matrix whose rows consist of (f[ A, and let M„ be 
a matrix whose elements are the nth power of the corresponding elements of M . 
Then the distribution of X can be determined up to the ambiguity of its mean by 
the marginal distributions of f3jY, k = 1, . . . , K . if and only if M„ has full column 
rank for all n>2, where the mean ofK satisfies the constraint Ai?[X] = -E[Y]. 

The proof of Theorem 12.11 follows the same idea as that of Lemma 12.11 The 
details are omitted to avoid technical redundancy. 

Remark. Theorem 12.11 provides a sufficient and necessary condition for the iden- 
tifiability of the X distribution by using the marginal distributions of a set of 



^Two projections are the same if one is a scale multiplication of the other. 
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ID-projections of the observed measurements Y. Unfortunately, the fuU column 
rank condition on M„, for any n > 2, is hard to verify for an arbitrary set of pro- 
jections. Further, it is worth pointing out that for identifiability, K > I is required 
since each M„ has / columns. However, when A satisfies some sufficient conditions 
such that the distribution of X can be identified from the joint distribution of Y 
(see for discussions of such A matrix), then we would expect that in most cases, 
the marginal distributions of a set of K ^ I properly chosen ID-projections of 
Y can be used to identify the distribution of X. This is due to the fact that by 
solving the polynomial equations, i.e. det(M„) = 0, the set of projection directions 
{f3k '■ k = 1, . . . ,/} (ignoring the scales) which violates the full rank condition has 
Lebesgue measure zero. 

2.2. Parameter estimation 

Let /(X|6') be the distribution of X with unknown parameter 9. We consider two 
estimators of 9 derived from the marginal distributions of ID-projections {/3^Y, k — 
1,...,K}, one based on their marginal likelihoods and the other based on their 
marginal characteristic functions. In later sections we shall give examples for each 
of these for instances of network tomography problems. 

2.2.1. Likelihood based inference 

Provided that it is easy to evaluate the univariate distribution of /3^Y, the 
KuUback-Leibler divergence between the empirical and model distributions of each 
/3jY, fc = 1, . . . , if can be used to obtain an estimator of 9. Let P„ be the empirical 
distribution of Y based on n i.i.d. samples and Zfc(-, 9) be the logarithmic likelihood 
function of P'^Y . Similar to the maximum pseudo likelihood method in [l6l |. an 
estimator of 9 based on the ID-projections can be defined by 

K 

(2) 9^D = argmin^ / -hiPlV, ffjdP^. 

k=i'' 

Usually there is no closed form solution to 9lo^ and the pseudo EM algorithm in 
[iQ] or other numerical optimization algorithms may be used. 

2.2.2. Characteristic function based inference 

Often the distribution of /3j Y is hard to evaluate since it is a high order convolution 
of the distributions of {Xi, . . . , Xj). In this case, as proposed in [91, the generalized 
method of moments (GMM) based on characteristic function [7|, [iJI can be used 
to obtain an estimator of 9. Let ipi{-,9) be the characteristic function of Xi and 
(pk{', S) be the empirical characteristic function of /3jY. Let A^ be the zth column 
of A. Then the characteristic function of /?JY, say fk{'i 9), is equal to 

/ 

Vk{t,9) = Ee^'"^^ =l[Uf3lA%9). 

i=l 

Now an estimator of 9 based on characteristic functions of the ID-projections can 
be defined as follows: 

^ r 

(3) 9cF = argmin 

k=l 
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where /i(-) is a predetermined distribution measure on TZ. In general, ([3]) may be 
solved numerically. For computational convenience, we may use an empirical dis- 
tribution of /i in the above as an approximation. When each Xi follows a mixture 
distribution, i.e. ipi{t) — X^fePfe^KO' Pk — 0' J2kP'k — where ^'j. are characteris- 
tic functions of corresponding mixture components, and 6 consists of the coefficient 
parameters p^, then for each i, the above objective function is a quadratic function 
of {p±}- Thus the optimization can be done iteratively by quadratic programming 
(see [9| for details) . Asymptotic properties such as consistency and asymptotic nor- 
mality have been studied for this estimator under mild conditions 7] . An improved 
estimator that takes into account the correlation of the empirical characteristic 
function ipk at different points t can also be considered following similar techniques 
developed in [9|. 

3. Optimal design of ID projections for the Gaussian tomography 
model 

In Section [21 it has been shown that the distribution of X can be identified using 
the marginal distributions of a set of K (K > I) properly chosen ID-projections. In 
this section, we consider the optimal design of these projections. We consider the 
two main factors: 1) the statistical efficiency of the estimators based on these pro- 
jections and 2) the computational complexity determined primarily by the number 
of such ID-projections, i.e. K . To achieve optimal design, we first consider a sim- 
ple Gaussian tomography model defined below and investigate the design issue in 
depth. Surprisingly, we found that for this simple model there is a minimal set of / 
ID-projections such that the estimator based on these projections is asymptotically 
as efficient as MLE. Such a set of projections constitute the optimal design. 

3.1. The Gaussian tomography model 

The Gaussian tomography model is defined by the tomography model in ([T]) when 
the components of X have mutually independent Gaussian distributions, i.e., Xi ~ 
JV{iii, ^i), i = 1, . . . , /, and Af{iii, 9i) stands for the normal distribution with mean 
jii and variance 9i. Notice that for a set of ID-projections of Y that satisfies the 
condition of Theorem 12.11 8 = (6*1, ... , 0/)'^ is identifiable but (/ii, . . . , /i/)"^ is not 
because of the mean ambiguity. For simplicity assume that = for i = 1, . . . , /. 
The Gaussian tomography model is defined as 

(4) Y-AA(0,AeA^), 

where 9 is the parameter of interest and is a diagonal matrix with diagonal ele- 
ments 9i. Since the distribution of /3jY is also Gaussian and thus easy to evaluate, 
given a set of ID-projections, we use the likelihood based method given by ([2]) in 
Section 2 to estimate 9. We investigate its statistical efficiency as compared to that 
of MLE, denoted as 9mle- 

Let S = AQA^ , and let indicate convergence in distribution. The following 
lemmas state the asymptotic distributions of the estimators 9mle and 9id- 

Lemma 3.1. Suppose that 9 is identifiable. Then 

where Ip is the Fisher information matrix with elements Xp{a,b) — \U^f^ and Uab 
is the {a,b)th element of U = A^Yi^^A. 
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Proof. The log-likelihood function of Y can be written as 

\ogp^iy;e) = -^Y^E-^Y- ilog(2^det(E)), 

where E — AQA^ . Notice that E = X)j=i OjA' A'^ ^ by taking partial derivatives 
w.r.t. each 0^, we have 

where vec{M) vectorizes M. So the score fmictions, say Si,i — are 

s,(Y) ^ - A iogp(Y; 0) = i {-(Y^E-iA')2 + trace(E-i(AM^^))} , 
where trace{M) denotes the trace of M . Hence the result follows from 



(.,(Y),.fe(Y)) = icot;((Y^E-M^-)2, (Y^E-M'=f ) 



= ^{coi;(Y^ E-M^ Y^ E-M^)}^ 
==i(v4J"^E-iA'=)2. 

The second equality in the above uses the fact that for any bivariate Gaussian 
random vector (iVi, TVa), cov{Nl,N^) = 2[cow(7Vi, iVa)]^. □ 

Lemma 3.2. For a set of ID-projections {/3'^Y,k = 1, . . . ,K}, let 7^ — A'^(3k, 
and af. — 0^Yil3k- Suppose that 9 can he identified by these ID-projections. Then 
the likelihood based estimator 6id defined by (0) using these projections has an 
asymptotic distribution 



(5) 



where Cm —2^^ ^''^d Xiu 



^V^WV . Here V is a K x I matrix with ele- 



ments Vka — '^k'^l'ka '^'^'^ W is a K X K symmetric matrix with elements Wkk' ~ 
aj^'^ aj^i'^ (P'^'SPk')'^ ■ Furthermore, if K — I and V is invertible, then the limit co- 
variance matrix in 0) simplifies to 2V^^WV^^ . 



Proof. Notice that 



Let lk{-,0) be the marginal logarithmic likelihood function of /3jY, that is, 



By the classical theory on M-estimators, we have 



(6) 

where 
(7) 



K 



-ID 



fc=i 

K 



k=l 



dik{p^Y,e)\ (d ikiPlY.e) 
de 
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and 

(8) 



^iD ~ var 



' K 

E 
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It is not hard to verify that 



Thus 



and 



86. 



dal 86, 



K 



CiD{a,b) =E|^ 



4 2 2 

k '^kal'kb 



K 



k,k' = l 
K 



k,k' = l 



Hence, do = hV^V and 1^ = W^WV. 



□ 



5.3. Optimal projections 

Given the asymptotic variance Cj^XidCj^ of the estimator duj ([H [7] and [5]) , it 
is not obvious how one can choose a set of ID-projections so that the asymptotic 
covariance matrix can be minimized. By ([7]) Cid is the summation of the information 
matrix of individual projection /3jY. Note that cow((/3fcY)2, {p^Yf) > 0. By dH), 

this imphes that the score functions of individual ID-projections, i.e. ^''"''^g^'^'' 
(fc — 1,...,K), are positively correlated. Intuitively, the directions Pk should be 
chosen such that the scores and thus the projections /3jY are mutually uncorrelated 
as much as possible from each other. To be more precise, reduce Xijj significantly 
while keep Cid stable. Since each projection is a linear function of the components 
of Y and the individual components of X are mutually independent, thus there 
will not be much information overlap if the projections are chosen such that each 
is close to an individual component of X. This motivates the following design for 
the Gaussian tomography model. 

Take K = I. For each fc € {!,...,/}, pick /3fc such that the correlation between 
/3jY and Xk, say corr{P'[Y , Xk), is maximized. Since 



corr{Xk,(3'Y) 



^var{Xk)var{pTY) ^/3Tcot;(Y)/3 

the scale on /3 does not affect the correlation. Furthermore, notice that the scale does 
not affect either Cid or Iid- By assuming var{0^Y) — 1, Pk can be determined : 

(9) /3fc = A^lI]-lA^ 

where \k = {A'^'^T.^^A'^y/^. Theorem 13.11 below shows that the projection direc- 
tions chosen by the above correlation rule leads to an efficient estimator of 6. 
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Theorem 3.1. Under the same condition as Lemma \3. 11 if the ID-projections are 
taken by the maximum correlation rule given by 0), then for the simple Gaussian 
tomography model, the ID-projection estimator given by (0) is asymptotically as 
efficient as MLE. 

Proof. By (H)), 7/ca = A^tM^^S^M''. Plugging this into JS]), it is not hard to verify 
that 

V ^2S-^Xf, 

and 

W = 2S-^1fS-^, 

where If is the same as in Lemma l3. II and 5* is a diagonal matrix with diagonals A|, 
A; = 1, . . . , /. Thus V is invertible and the limit covariance matrix in ([5]) simplifies 
to Ip"^. The result follows. □ 

Notice that the optimal ID-projections /?feS in Q depend on unknown parameters 
9 and thus are unavailable. Fortunately, (3k only depends on S and the sample 
covariance of Y, say S, is a -^/n-consistent estimator of E. Thus we can plug-in S and 
obtain empirical estimates of Pk- By assuming that 9 belongs to a compact subset 
of 7?,^, from the theory of generalized M-estimators in this plug-in ID-projection 
estimator is asymptotically efficient. We omit the tedious technical verification but 
refer to Chapter 7 of for details. 

More realistic Gaussian tomography models assume a mean-variance relation- 
ship and thus the above theorem may not hold. But simulation studies in the next 
section suggest that the above plug-in ID-projections still work better than arbi- 
trary projections and the performance of the corresponding estimator is close to 
MLE. 



3.3. Comparison with other projections 

We now compare the statistical efficiency of 9id based on the optimal set of ID- 
projections given by Q with that based on other choices of projections. We consider 
two such choices. The first is based on the set of all pairwise 2D-projections of Y 
that are proposed in [l6j, called ^2_d, the second is based on a set of K randomly 
chosen ID-projections while adjusting for the correlation of Y, i.e., each random 
projection is generated by 

(10) PlY = a^E-i/^Y, 

where au is drawn independently from the standard multivariate Gaussian distri- 
bution with an identity covariance matrix. 

Let 92D be the estimator of 9 based on all pairwise 2D-projections of Y and 
let f{{Yk, Yk')\9) be the distribution of the bivariate variable (Yfe, Yfe'). Then 92d is 
defined by 

(11) 92D = B.rgmmf ^ -log f{{Yk,Yk-)\9)dP^. 

l<k<k'<J 

Following similar arguments as in Lemma 13.21 it can be shown that 

(12) V^(0"2D -0)^dM (0, C-^I2dC2d) > 
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Fig 4. The limit standard deviations of y/n{9 — 9) for four estimators for the 16 variance pa- 
rameters di: the two random ID-projection estimators with K = 21 = 32 and K = 10/ = 160 
projections (reporting medians of 100 replications), 820 ^.''^d, the optimal 9i£). A is given in H3\} 
below. The x-axis represents the values of 16 specified variance parameters, 9i,i = 1, . . . , 16. 



where 

k<k' 



k<k' ,l<l' 



and S^;, 



is a 2 X 2 matrix consisting of the elements of E. 



Since it is hard to evaluate in closed form the expectation of the asymptotic 
covariance matrix of Bid using random ID projections, we use simulations to study 
its performance. In the simulation, we use a 7 x 16 A matrix, shown in (|13|) below, 
derived from a later simulation study of a traffic demand tomography problem on 
a one-router network in Figure [31 That is, / = 16 and J — 1 . The parameters 
— {61, ... , 6iq)^ are generated i.i.d. from the exponential distribution with mean 
1, and remain fixed throughout the simulation. In each simulation run, we randomly 
draw K ^ 21 and K = 10/ ID-projection directions /3fc as in (fTU)) . and then 
calculate the limit covariance matrix of these two estimators using ([5]). We then 
replicate this procedure 100 times. For comparison, we also calculate the limit 
covariance matrix of \fn{B\j:) — &) for the optimal ID-projections using ([9]) (same as 
that of \pn(BMLE ~ 0)), and that of y/n{02D — 6)- Figure [4] shows the limit standard 
deviations of the 16 variance parameters 9i for four estimators: the two random ID- 
projection estimators (reporting medians of 100 replications), 620 and the optimal 
9iD- The plot shows that the 2D-projection estimator is not efficient asymptotically 
by comparing with the optimal one or MLE. The plot also suggests that as the 
number of projections grows, the limiting covariance matrix for the random ID- 
projections converges to the optimal one but we leave it to future studies. 
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4. Simulation studies for realistic models in network tomography 

In Section[3l we have shown that for the simple Gaussian tomography model ([4]), the 
estimator Oijj by using the optimal set of ID-projections of Y is asymptotically as 
efficient as MLE (Theorem 13. II) . For other models, however, it is not clear whether 
the ID-projection method can lead to an efficient estimator since a linear structure 
inherent in the Gaussian model no longer exists. In addition, the correlation rule 
which gives an optimal set of ID-projections Q may no longer be optimal. Since 
theoretical investigations of efficiency are difficult for general models, we resort to 
simulations to study the performance of the ID-projection method (i.e. 9id and 
Ocf), especially using ID-projections that obtained from the correlation rule 
We study the performance of the method under two realistic models in network 
tomography. The first example is for traffic demand tomography |2l[ with Gaus- 
sian OD traffic model where the mean and variance of traffic counts is related by 
a power equation and the second example is for delay tomography 
with mixture models for link delay distributions. We demonstrate that in both cases 
the ID-projection method yields satisfactory results. 



4.1. Traffic demand tomography using the Gaussian OD traffic model 

Let X = {Xi, . . . , Xj)^ be the Origin-Destination (OD) traffic counts in a network. 
In traffic demand tomography, we observe Y — AX., where Y is the measured 
link traffic counts collected for instance using SNMP, and A is the network routing 
matrix with elements or 1. The problem is to estimate the distribution of each 
Xi from independent samples of Y. Following Q, assume a Gaussian OD traffic 
model with a power relation between the variance and mean of traffic counts, i.e., 
each (z = 1, . . . , /) is an independent Gaussian random variable with var{Xi) — 
(j){EXiY, where (j) is an unknown scale parameter and c > is a known power 
exponent. Let 9i = EXi be the mean counts of the ith OD traffic. The parameter 
of interest is 9 — (^i, . . . , 0/)"^ and (j) is an unknown nuisance parameter. 

In the simulation, we use the same one-router network with four attached in- 
put/output links as in [5|, reproduced here in Figure [3l For this network, there 
are a total of 16 OD pairs from all pairwise combinations of an input and output 
link. For a certain arrangement of the 16 OD pairs, as described in [H, the routing 
matrix A can be written as a 7 x 16 matrix that has entries of except in the places 
indicated below 



(13) 



fill 

1111 

1111 

1111 
1111 
1111 
1111 



For each OD pair Xi, we generate the mean traffic rate 9i independently from a 
lognormal distribution with mean=10 and sd=.95 in the log scale. The variance 
of the traffic rate is generated from a mean-variance relation var{Xi) = 10000^, 
i.e., the scale parameter (j) = 1000 and power exponent c = 1. These parameters 
are chosen to be compatible with the real data observed on the same one-router 
network in 
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Fig 5 . The median estimation errors of mean traffic rates for the 1 6 OD pairs from 1 00 simulation 
runs. The estimation error is measured by |log9i — log 6^1, i = 1, . . . , 16, The four estimates 
are: moment estimator (MM), estimates obtained from the marginal likelihoods of 2D-projections 
(2D), from the marginal likelihoods of ID-projection method using the optimal set of projections 
in {9p (ID), and MLE. The x-axis represents the specified mean traffic intensity values for the 
16 OD pairs in a log based scale. Details of the simulation setup are described in Section\4. 1\ 



In each simulation run, we generate n = 1000 independent samples of OD traffic 
counts and estimated the mean OD traffic rates from the resulting link traffic counts. 
We consider four estimators for performance comparison: MLE, likelihood based 
estimator obtained from the correlation-based ID-projections, likelihood based es- 
timator obtained from all pairwise 2D-projections as proposed in [16] and a moment 
estimator. Similar to that used in 2l|, the moment estimator here is obtained by 
solving a system of over-determined linear equations constructed using the mean 
and variance of the link traffic counts Y. The moment estimator is also used as the 
starting value of the other estimators obtained using numerical optimizations. 

Figure O shows the median estimation errors of mean traffic rates for 16 OD 
pairs from 100 simulation runs. The estimation error for each 6i is measured by 
jlog^i — log 6*^1 for all estimates. The plot shows that the correlation based ID- 
projection estimator performs slightly worse than MLE, but much better than the 
2D-projection estimator and the moment method. We have observed very similar 
results for other parameter settings. This suggests that the correlation based ID- 
projections may be close to being optimal for the Gaussian OD traffic model with 
power mean-variance relation. 



4-2. Network delay tomography using mixture modeling of link delays 

In the context of network delay tomography, one is interested in inferring the distri- 
butions of network link delayqfj from the end-to-end delay measurements, obtained 
either passively or actively. As an example, consider the four-leaf network shown in 
Figure[2] Here 1 = 7 and J = 4. Suppose that at the root of the tree, we send active 



^To be precise, the delay here is the queuing delay that excludes the constant link propagation 
delay, we omit queuing when context is clear. 
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multicast probes to the four leaves, resulting four simultaneous end-to-end delay 
measurements for each probe. Let X — {Xi, . . . jXj)"^ be the internal link delays, 
and Y = (Yi, . . . ,¥4)^ be the end-to-end delay measurements. Then Y = AX., 
where A is the routing matrix with elements or 1 derived from the tree. The net- 
work delay tomography problem is to infer the distribution of X from independent 
samples of Y. 

There have been significant amount of work on network delay tomography 
lol . iH , Til . 2o| . and most of these approaches use likelihood based estimation. The 



likelihood based estimation uses a discrete distribution with equal width bins to 
approximate a continuous link delay distribution, where the same bin width is used 
for all the links. However, for a heterogeneous network such as the Internet, such 
a discretization process will result in an over-parameterized link delay model and 
hence lead to high computational complexity. Recently, a characteristic function 
based estimation approach has been proposed as an alternative approach to accom- 
modate network heterogeneity (0), using a flexible mixture modeling of link delays. 
As described in Section r2. 2. 21 instead of minimizing the KuUback-Leibler distance 
between the empirical and model distribution, by which the maximum likelihood 
estimator is derived, their estimator is derived by minimizing a L2 distance between 
empirical and the model characteristic function. It is also found that the estimator 
derived from comparing the characteristic functions of low dimensional components 
of Y yields better performance, as compared to Y itself, where the difficulty of the 
latter is how to choose an appropriate high dimensional weight function fi. 

In the following, we run simulations of the delay tomography model on the 
four-leaf tree, and compare the performance of 9cf defined in ^ based on ID- 
projections with that based on all pairwise 2D-projections of Y. We do not compare 
the estimates against MLE because MLE is computationally infeasible for the flex- 




FlG 6. The estimated cumulative distribution functions of link delays on a four- leaf tree (Figure\^ 
from 1000 end-to-end delay measurements. The true delay distributions, shown in solid line, are 
those generated from a M/M/1 queue. All three estimates are obtained using the marginal char- 
acteristic function based approach described in Section \2.2.2\ usina a mixture model of piecewise 
uniform of 6 bins and an exponential tail. The estimates are: 2D-projection method (2D), ID- 
projection method using projections in {9p (ID. Cor), and ID-projection method using I random 
projections in {9)) (ID. Rand). Details of the simulation setup are described in Section \4.2\ 
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ible mixture model that we use in deriving these estimates. In the simulation, each 
link delay distribution Xi, i = 1, . . . , /, is generated independently from a queuing 
distribution of an M/M/1 queue of the following form 



P{Xi > x) = Wiexp(— ^x), X > 0, and P{Xi — 0) — 1 ~ m 

where < Ui < 1 is the utilization of the queue, and > is the service rate 
of the queue time (1 — Ui). For each link delay Xi, we generate a corresponding 
Ui from a uniform distribution in the interval [0.3,0.7], and Vi from an exponential 
distribution with mean 3. To obtain our estimates, we first model each link delay 
by a piecewise uniform distribution with an exponential tail, using 10 bins placed 
at quantiles of the distributioijf]. The estimates of the link delay distributions based 
on ID-projections of Y are obtained by ([3]), where a Gaussian weight function, with 
standard deviation of 5 after normalizing each projection, is used for /i to calculate 
the L2 distance and the integrals are approximated by using Monte-Carlo methods. 
The estimates based on 2D-projections are computed similarly. 

In each simulation run, a total of 1000 delay samples are generated for each link 
in the four-leaf tree from its specified delay distribution and then the end-to-end 
delays are computed. For each of the seven link delay distributions, we consider 
three estimators: 9c f by using the correlation based ID-projections as in 9c f 
using K — I — 7 random ID-projections adjusted for the covariance of Y, and the 
characteristic function based estimator using all pairwise 2D-projections by [16]. 
Figure [S] plots the cumulative distributions of the estimated link delays along with 
the generated true distribution for one simulation run. From the figure, we observe 
that all estimators yield satisfactory results. 

To measure the accuracy of the estimates, we use the Mallows' distance [13] 
defined for a cumulative distribution distribution F and its estimate F by 



where F~^ and F~^ are the inverse cumulative distributions. The Mallows' dis- 
tance can be viewed as the average of absolute difference in quantiles between two 
distributions. Because the Mallows' distance is linear to the scale of distributions, 
we use M{F,F)/ap, the normalized Mallows' distance, to measure the difference 
between F and the estimate F, where ap is the standard deviation of F. 

We repeat the simulation 100 times and compute the normalized Mallows dis- 
tance between the estimated and the simulated distributions for all links. Figure [7] 
reports the median of the normalized Mallows' distance between the three estimates 
and the generated true distributions for each link. We observe that the estimator 
using 2D-projections yield best results overall, the estimator using the correlation 
based ID-projections is a close second, and that using random ID-projections per- 
forms the worst. This indicates that although the correlation based ID-projections 
may not be the optimal directions, the information loss using these ID-projections 
as compared to all pairwise 2D-projections is not significant. 



■^The quantiles are unknown in real applications, but they are assumed to be predetermined 
hero for simplicity. Otherwise, they can be estimated through iterations of the estimation process 
with an initial value. This has been suggested in Q as a strategy for placing bins for modeling 
the link delays and has been shown to yield accurate estimators. 





60 



A. Chen and J. Cao 




Fig 7. Medians of the normalized Mallows' distance for the three link delay distribution estimates 
from 100 simulation runs, where each simulation run the same setup as in Figure\^ 



5. Conclusion and future work 



This paper proposes a one-dimensional linear projection method for solving a class 
of linear inverse problems in network tomography. For the simple Gaussian tomog- 
raphy model, the optimal set of ID-projections is derived and it is shown that 
a likelihood based estimator based on these ID projections is asymptotically as 
efficient as the usual maximum likelihood method. For more realistic models in 
network tomography, simulation studies show that the estimators derived from the 
marginal distributions or marginal characteristic functions of ID-projections per- 
form well for large sample sizes. Future work includes generalization of the optimal 
design of ID-projections for non-Gaussian tomography models and small sample 
studies of the proposed method. 
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